I. INTRODUCTION

The development of computer and information processing
has come to the stage of being able to handle Korean and
Chinese character input and output. There is no problem in
information systems for the input and output of characters
from a standard Roman character keyboard, but the problems
related to non-Roman characters, from I/O to the software
problems of language handling, remain almost unsolved. Until
recently the computer could not handle Korean or Chinese
characters efficiently. It was not user friendly, and data
processing in Korea was imperfect and very unwieldy. Among
the problems, the biggest issue is how to enter 2,369 Korean
and 1,800 common Chinese characters from the standard Roman
character keyboard.

During the last few years, there have been great efforts
at universities, research institutes and manufacturers for
the development of good I/O devices for Korean characters.
In Korea, natural language processing, especially Korean
language processing, is one of the essential elements for
the future of computer and information systems.

First, the properties of Korean and Chinese characters
will be presented as an introduction for those unfamiliar
with these characters. Then, the resolution power of CRTs
and dot matrix printers and their relation to the shape
characteristics (readability, aesthetic quality, etc.) of
Korean and Chinese characters will be discussed. The
methods which are developed for Korean and Chinese character
I/O can be applied to other character sets, especially to
many non-Roman alphabetic character sets, not to mention
Chinese characters in China.




II. BACKGROUND


A. PROPERTIES OF KOREAN DOCUMENTS 


Common documents in Korea are usually written in a mixed
form utilizing Korean and Chinese characters. Minor use is
made of Roman script. The usage of each character set
depends on the kind of document. In order to perform word
processing efficiently in Korea, the simultaneous editing of
these characters is essential. Table I shows the use of
characters found for various types of documents. These data
are based on sampling performed expressly for this study. The
following sources were used in the sampling process to
construct Table I:


1. newspaper - Korean Daily Times [Korean title illegible
in source], 16 September 1984

2. journal - "National Security", June 1984

3. technical papers (A) - "COBOL Programming", [publisher
name illegible in source] Publishing Co., 1978

4. technical papers (B) - "Introduction to Law", Beob
Moon Sa Publishing Co. [year illegible in source]

5. business papers - Korean Air Lines Co.

Although the sample was taken from a single source for each
kind of document, it is the authors' view that the documents
selected are representative of the entire population of each
type.


B. CHARACTERISTICS OF KOREAN SCRIPT 


The native Korean alphabet was introduced in 1446, after
centuries of the use of a more cumbersome method (known as
IDU) to transcribe Korean with Chinese characters. The set
of 28 letters¹ (now 24 letters) was designed by a group of
scholars commissioned by King Sejong (1419 - 1450), the
fourth King of the Yi dynasty.

                          TABLE I
             Proportions of Written Characters

                     News-   Journal  Tech.   Tech.   Business
                     paper            (A)     (B)     Paper
  Roman script        1%      3%      40%      0%      10%
  Korean script      84%     76%      55%     55%      80%
  Chinese character   5%     21%       5%     45%      10%

  * (A): Technical papers from western countries
  * (B): Traditional and historical papers

The Korean language is spoken, and its alphabet written,
by an estimated 50 million people on the Korean peninsula
and its coastal islands. Many among the approximately one
million Koreans residing in Japan, China, and America still
speak and write the language [Ref. 9].

The Korean alphabet currently used consists of 14
consonants (ㄱ ㄴ ㄷ ㄹ ㅁ ㅂ ㅅ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ) and 10
vowels (ㅏ ㅑ ㅓ ㅕ ㅗ ㅛ ㅜ ㅠ ㅡ ㅣ). There are also 17
compound consonants [letters illegible in source] and 11
compound vowels (ㅐ ㅒ ㅔ ㅖ ㅘ ㅙ ㅚ ㅝ ㅞ ㅟ ㅢ). The letters
of the Korean alphabet cannot be used independently but are
used to build syllables. Each Korean character consists of
two or three parts. The first part must be a consonant or
compound consonant. There are 19 letters that are possible
for the first part of the Korean character
(ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ). The
second part of the Korean character is a vowel or compound
vowel. There are 21 possible letters for the second part
(ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ).
The third part of the Korean character is optional and
depends on the character being depicted. The third part, if
present, must be a consonant or a compound consonant. There
are 28 letters possible as the third part [letters illegible
in source]. This section has been summarized in Figure 2.1.

¹ A letter is an element of a character. The character
consists of two or three letters. Letters in Korea are a set
of 14 consonants and 10 vowels.

The Korean system of writing is called "Hangul". It is
phonetic writing, like English, in the sense that the
symbols represent sounds, that is, consonants and vowels.
Unlike English symbols, which are grouped directly into
words (e.g., E+n+g+l+i+s+h = English), Korean symbols are
first grouped by syllable (e.g., H+a+n g+u+l = 한글)
[Ref. 10].

Korean symbols are written in syllabic groupings. An
enumeration method² is to put letters side by side as in
"LONDON". But the Korean language stacks the letters in most
characters. For example, "LONDON" would be written as
[stacked example illegible in source]. The simplest syllable
is written with one consonant and one vowel. When one writes
the symbol for a vowel alone, one must add the consonant
symbol "ㅇ", which indicates an initial mute (which is
classed as a consonant). In this simple consonant and vowel
syllable, there are two types of arrangements:
side-by-side arrangement (e.g., 가) and


² In an enumeration method, letters are placed side by
side, element by element, using a set of consonants and
vowels.


THE CHARACTERISTICS OF KOREAN CHARACTERS

- THE KOREAN ALPHABET CONSISTS OF 24 BASIC LETTERS (ELEMENTS):
    14 CONSONANTS: ㄱ ㄴ ㄷ ㄹ ㅁ ㅂ ㅅ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ
    10 VOWELS:     ㅏ ㅑ ㅓ ㅕ ㅗ ㅛ ㅜ ㅠ ㅡ ㅣ

- EACH CONSONANT AND VOWEL CAN BE COMPOUNDED

- POSSIBLE COMPOUND CONSONANTS:
    [letters illegible in source]

- POSSIBLE COMPOUND VOWELS:
    ㅐ ㅒ ㅔ ㅖ ㅘ ㅙ ㅚ ㅝ ㅞ ㅟ ㅢ

- EACH CHARACTER CAN BE DIVIDED INTO THREE PARTS (FIRST
  SOUND, MIDDLE SOUND, FINAL SOUND) OR TWO PARTS (FIRST AND
  SECOND SOUND).

- THE FIRST PART MUST CONSIST OF A CONSONANT OR A COMPOUND CONSONANT.
  THE SECOND PART MUST CONSIST OF A SINGLE OR A COMPOUND VOWEL.
  THE THIRD PART IS OPTIONAL. IF USED, IT MUST BE A CONSONANT.

- THE FOLLOWING LETTERS CAN BE USED AS THE FIRST PART:
    ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ
    ; 19 LETTERS

- THE FOLLOWING LETTERS CAN BE USED AS THE SECOND PART:
    ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ
    ; 21 LETTERS

- THE FOLLOWING LETTERS CAN BE USED AS THE THIRD PART:
    [letters illegible in source]
    ; 28 LETTERS

NUMBER OF POSSIBLE COMBINATIONS OF CHARACTERS = 19 * 21 * 29 = 11,571
IN PRACTICE, ONLY ABOUT 2,400 CHARACTERS ARE USED.

Figure 2.1 The Korean Alphabet.

top-to-bottom arrangement (e.g., 고). The particular vowel
being written determines which arrangement is used.

Representing these character syllables through a
computer creates a problem because the shape of each letter
(consonant and vowel) can differ, owing to the requirement
that each character be balanced, i.e., have the same size
and achieve a desired aesthetic quality. For example, when ㄱ
is placed to the left of a vowel, the downward portion is
slanted (e.g., 가); when it is placed on top of the vowel,
the downward portion becomes straight (e.g., 고). As shown
above, it is very difficult to apply these different shapes
for a particular letter to a line printer and a typewriter.
This problem will be discussed in detail in the following
chapter.

By mathematical calculation, the possible number of
Korean characters is 11,571 (19 * 21 * 29). It must be
noted, though, that only 2,369 characters are commonly used
[Ref. 6].
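
The count quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the reading of the factor 29 as "28 third-part letters plus the case of no third part" is an assumption made here, and the names are hypothetical.

```python
import math

FIRST_SOUNDS = 19       # letters possible as the first part
SECOND_SOUNDS = 21      # letters possible as the second part
THIRD_SOUNDS = 28 + 1   # assumption: 28 third-part letters plus "no third part"


def possible_characters() -> int:
    """Number of syllable characters that can be composed in principle."""
    return FIRST_SOUNDS * SECOND_SOUNDS * THIRD_SOUNDS


total = possible_characters()
# Smallest number of bits that can number every combination.
bits_needed = math.ceil(math.log2(total))
# Share of the possible characters that are commonly used (2,369).
used_share = 2369 / total

print(total, bits_needed, round(used_share, 3))
```

Only about a fifth of the possible combinations are in common use, which is why a code that addresses the three parts separately can cover every character without a large table.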


C. CHARACTERISTICS OF SINO-KOREAN CHARACTERS 


Sino-Korean characters are Chinese characters used in
Korea. They are different from those used in China. Koreans
refer to Chinese characters as Hanja. Chinese characters
have a long history, the earliest discovered writings having
been dated from about the 14th century B.C. During the Han
Dynasty, this was modified by Hsu Sheng [dates illegible in
source] in his 15-volume paleographical work, Shuo-wen
Chieh-tzu (說文解字), which translates to the explanation of
writing and analysis of words. That work lists 9,353
characters under 540 radical entries. Of this number, 364
are pictographic, 125 simple ideographic, 1,167 compound
ideographic and 7,697 phonetic compounds.

The most complete collection, the Kang Hsi Dictionary,
with about 50,000 characters, was published in 1716. Since
1949, after the establishment of the People's Republic of
China, the Chinese government actively pursued language
reform until the Cultural Revolution, 1966-1976. The Chinese
government changed and simplified the characters from the
original set [Ref. 5: p. 15].

The number of characters used commonly is from 1,000 to
[upper figure illegible in source]. Table II [Ref. 1: p. 819]
shows the frequency of Chinese characters used in typical
documents.

TABLE II
Frequency of Chinese Characters Used in Documents

[Table body largely illegible in source. Column headings:
Newspapers (%), General (%), Total (%), Newspapers (chrs),
General (chrs). The last legible row reads: 3000, 99.8.]

* Chrs: abbreviation of characters

In 1972, the Korean Ministry of Education suggested that
1,800 Chinese characters be learned and used for educational
purposes [Ref. 3]. In this study, the authors will restrict
themselves to that set of 1,800 characters. The Chinese
characters are called Hantzu in Chinese, Hanja in Korean,
and Kanji in Japanese. All mean "Han characters".
These characters are used exclusively in Chinese writings,
and in combination with the Hangul (Korean) alphabet in
Korea and with the Kana syllabaries in Japan. The
Sino-Korean (Hanja), in written form, is a combination of
three major elements: pictograms, ideograms, and
phonograms [Ref. 5].

In the next chapter each character will be treated as a
picture, because of both the complexity of Chinese
characters and the ease of representing pictures in the
computer. Each Chinese character has a meaning and a sound;
for example, 天 means heaven and the sound is cheon. Also,
there are many characters which have different meanings but
the same sound, or the same meaning but different sounds.
There are several methods to solve this problem.
Appendix A [Ref. 5: p. 17] shows the evolution of Chinese
characters.


III. PROBLEMS OF EDITING KOREAN AND CHINESE SCRIPTS


A. CURRENT EDITING TECHNOLOGY 


The current word processing practice in Korea is to type
Korean characters by the enumeration method, that is, input
letters (8-bit code: consonant and vowel in sequence) and
output these letters as a character syllable using a Korean
character conversion program for Korean script. Appendix B
shows the EBCDIC input codes currently used by FACOM, and
Appendix C depicts MDS (Mohawk Data Sciences) input codes
used by IBM. To type Chinese characters the following
sequence is followed:

1. Depressing a Chinese character function key.

2. Typing the sound character of a Chinese character
using the enumeration method.

3. Displaying all homonym characters (from 1 to 60)
[Ref. 4: p. 34] that have the same sound.

4. Selecting one character by using an index number, and
entering the character to a buffer or file.
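
The four-step sequence above can be sketched as follows. This is a minimal illustration, not the FACOM or IBM implementation: the table `HOMONYMS`, its entries, and the function names are hypothetical, and a real system would hold the full character set.

```python
# Hypothetical homonym table: Korean sound syllable -> Chinese characters
# sharing that sound (a tiny illustrative sample, not the thesis data).
HOMONYMS = {
    "천": ["天", "千", "川", "泉"],
    "학": ["學", "鶴"],
}


def list_candidates(sound):
    """Step 3: number the homonyms of a sound, starting from index 1."""
    return list(enumerate(HOMONYMS.get(sound, []), start=1))


def select(sound, index):
    """Step 4: pick one homonym by its displayed index number."""
    candidates = HOMONYMS.get(sound, [])
    if not 1 <= index <= len(candidates):
        raise ValueError(f"no candidate {index} for sound {sound!r}")
    return candidates[index - 1]


buffer = []
# Steps 1 and 2 (function key, typing the sound) are represented here
# simply by passing the sound syllable to the functions.
print(list_candidates("천"))
buffer.append(select("천", 1))
buffer.append(select("학", 1))
print("".join(buffer))
```

Every Chinese character costs a display step and a selection step in addition to typing the sound, which is the inconvenience listed among the disadvantages below.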

Machines dealing with Korean language data are currently
available from the IBM and FACOM corporations in Korea;
the IBM Multistation 5550 (1984) and FACOM OS IV (KEF) (1982)
are newly updated and well developed machines. These
machines still have several disadvantages in handling Korean
and Chinese characters:

1. A large amount of time is spent in character conversion.

2. It is difficult to directly delete and insert records
in a file.

3. The word processing editor cannot recognize the
characters being edited before executing a character
conversion program, since only the enumerated letters
can be displayed.

4. The method of entering characters is inconvenient and
requires a tremendous amount of effort for Chinese
characters.

5. One cannot convert all Korean character syllables
into Chinese characters because there is not a one to
one mapping.

6. Data communication is impossible since there are no
standard codes for Korean and Chinese characters.

Appendices D and E show the keyboards of the IBM Multistation
5550 [Ref. 8: p. 14] and FACOM OS IV (KEF) [reference
illegible in source], respectively.


B. USER REQUIREMENTS


Most potential users have recognized that the computer
is essential in data processing and office automation.
However, because of the above constraints, current machines
are unsatisfactory for use with the Korean language. Some
general requirements that users place on computer
researchers and manufacturers are the following:

1. Users want to use Korean language commands and
programs, but there are no Korean language oriented
operating systems or programming languages comparable to
COBOL, FORTRAN, Pascal, etc.

2. Users want to edit three kinds of characters
simultaneously and in a user friendly manner.

3. Users want to display and print out data without
using a conversion program such as is now needed for the
Korean alphabet, because of the time, memory space, and
inconvenience involved.

4. Users want to use interactive files and database
processing.

In summation, they want to use computers that handle three
kinds of script in the same manner in which present
computers do with the Roman alphabet.


C. PROBLEMS OF REPRESENTATION OF THE THREE KINDS OF SCRIPTS


Because of the characteristics of Korean and Chinese
characters, the following problems occur:

1. How can one enter 2,400 Korean characters and 1,800
Chinese characters into a computer through a limited
number of keystrokes?

2. How can one develop the system program for direct
input and output without using a conversion program?

3. How can the aesthetic quality of display and output be
improved?

4. How can one increase the processing speed and reduce
the memory space for these character definitions?

There are other problems, but the above problems are the
most significant. Among these problems the first one is the
most serious, and consequently the authors will give it more
attention in this study.


IV. POSSIBLE METHODS FOR KOREAN LANGUAGE DATA PROCESSING

In order to solve the problems which were mentioned in
the previous chapter, the following methods are offered as
possible alternatives for Korean language data processing.


A. 8-BIT CODE FOR KOREAN ALPHABET


The Korean alphabet consists of only 24 letters, and
Korean language data can be expressed using only Korean
characters without a serious problem. The enumeration
method, as with the Roman alphabet, is therefore the easiest
way to represent Korean characters without changing the
hardware and the operating system. This method is not highly
readable, however, and would require changes in the written
language which may not be acceptable to users.
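
To make the enumeration method concrete, the sketch below writes a syllable character as its letters side by side. It relies on the arithmetic layout of precomposed Hangul syllables in Unicode, which is an assumption of this illustration and not part of the systems described here; the function name is hypothetical.

```python
# Letter tables in the order used by the Unicode Hangul syllable block.
FIRST = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
SECOND = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
THIRD = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ",
         "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ",
         "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]

HANGUL_BASE = 0xAC00   # first precomposed syllable
HANGUL_LAST = 0xD7A3   # last precomposed syllable


def enumerate_letters(text):
    """Write each syllable character as its letters, side by side."""
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = code - HANGUL_BASE
            first, rest = divmod(index, len(SECOND) * len(THIRD))
            second, third = divmod(rest, len(THIRD))
            out.append(FIRST[first] + SECOND[second] + THIRD[third])
        else:
            out.append(ch)  # non-Hangul characters pass through unchanged
    return "".join(out)


print(enumerate_letters("한글"))
```

The linear output loses the syllable blocks that Korean readers rely on, which is the readability problem discussed in this section.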
1. Using th 


A program can be loaded which defines the 24 letter
Korean alphabet to a character generator instead of the
lower case Roman alphabet. All Korean alphabet elements and
the upper case Roman alphabet characters are then available
through the standard Roman character keyboard. With this
method the user can use a computer in a manner similar to
that of users of the Roman alphabet. In addition, well
developed hardware and software can be used without critical
problems. This method has been suggested by many groups of
people from the time when the Korean typewriter was first
developed. The only disadvantage is the breaking of
traditional custom. To capitalize on developed technology
and for the ease of application, more study and research
should center on user acceptability of the enumeration
method.




Figure 4.1 shows an example of hard copy which uses a
graphic dot printer and a standard keyboard. Appendix F
shows the load command program for an alternative character
generator for the Korean alphabet. This program can be
generated easily by the alternative character set editor,
and it loads the Korean alphabet to an alternative character
generator instead of the lower case Roman alphabet.

[Figure content illegible in source: sample hard copy of
Korean text written by the enumeration method.]

Figure 4.1 Example Using Standard Keyboard.


2. Using the Capital Letters as the Initial Letter


The major difficulty with the enumeration method is
poor readability. Korean users read a sentence sequentially,
syllable by syllable. In order to increase readability, the
initial letter of each character can be written as an upper
case letter to distinguish the syllable easily. Figure 4.2
shows the example using the capital letters and Appendix G
represents the load command program for these letters. A
special mark or altered shape of each letter also can be
applied to increase readability when an enumeration method
is used.

[Figure content illegible in source: sample Korean text
written by the enumeration method with capital initial
letters.]

Figure 4.2 Example Using Capital Letters.


B. 16-BIT CODE FOR THE THREE KINDS OF SCRIPTS


There are various methods one can use to enter Korean
and Chinese characters, but the 16-bit code is one of the
better methods, since it can identify all possible Korean
and Chinese characters without using the enumeration method
and a conversion program. The structure of this code will be
discussed briefly in the following subsection.


As mentioned before, a Korean character syllable
consists of three parts:

1. First sound; one of 19 simple or double consonants.
2. Second sound; one of 21 simple or compound vowels.
3. Third sound; one of 28 simple or compound consonants
(optional).

Since the number of each of the first, second, and third
letters is less than 32, 5 bits are enough to identify each
sound. All possible Korean characters can be identified
using 15 bits. The first bit of the 16 bits is used to
indicate a Korean character (by a 0). The next 5 bits are
used for the first sound, the following 5 bits for the
second sound, and the final 5 bits for the third sound.


Table III shows the structure of the 16-bit code for the
Korean character and Table IV explains the 16-bit code for
the Korean character. This code table is basically the same
as the IBM 2-byte internal Korean character code
[Ref. 6: p. 52]. The only difference is the arrangement.
Some IBM codes represent three letters. This makes key tops
(the face of each key) more complex; for example, a single
5-bit code (one key top) represents three letter values
[code and letters illegible in source]. The code suggested
in Table IV reduces some of this complexity by limiting the
possible values to no more than two for each key top. In
contrast to the example for IBM codes, the same code from
Table IV represents only one value. Appendix H represents
the IBM 2-byte internal Hangul code for the Korean
character.

TABLE III
Structure of 16-bit Code for Korean Script

   bit:    1    |   2 - 6     |   7 - 11    |  12 - 16
           0    |  1st sound  |  2nd sound  |  3rd sound
         1 bit  |   5 bits    |   5 bits    |   5 bits
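
The field layout of Table III can be sketched as bit operations. The 5-bit values passed in below are hypothetical index numbers, since the actual assignments of Table IV are not used here, and the function names are illustrative only.

```python
KOREAN_FLAG = 0  # first bit of the 16-bit code: 0 marks a Korean character


def pack_korean(first, second, third=0):
    """Pack three 5-bit sound codes into one 16-bit Korean code."""
    for name, value in (("first", first), ("second", second), ("third", third)):
        if not 0 <= value < 32:
            raise ValueError(f"{name} sound code {value} does not fit in 5 bits")
    return (KOREAN_FLAG << 15) | (first << 10) | (second << 5) | third


def unpack_korean(code):
    """Split a 16-bit Korean code back into its three 5-bit sound codes."""
    if code >> 15 != KOREAN_FLAG:
        raise ValueError("first bit is 1: not a Korean character code")
    return (code >> 10) & 0x1F, (code >> 5) & 0x1F, code & 0x1F


code = pack_korean(18, 0, 4)   # hypothetical sound codes
print(f"{code:016b}", hex(code), unpack_korean(code))
```

Because each sound occupies its own field, a character can be taken apart or changed with shifts and masks alone, without a conversion table.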

The suggested code has several advantages. First, it
is easy to sort the character order by its value since the
value of each letter is in the order of the Korean alphabet.
Second, it can reduce the memory space for data by using 2
bytes instead of 3 bytes for one character. Third, it is
possible to edit the character directly since it does not
need code conversion. Fourth, since the code value of the
Korean character can easily be recognized, it helps a
programmer when writing programs.
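
The first two advantages can be illustrated briefly. The sketch assumes, as the text states, that 5-bit values are assigned in alphabetical order; the sample values and the helper `pack` are hypothetical.

```python
def pack(first, second, third=0):
    """16-bit Korean code: flag bit 0, then three 5-bit sound fields."""
    return (first << 10) | (second << 5) | third


# Hypothetical syllables given as (first, second, third) alphabetical indices.
syllables = [(18, 0, 4), (0, 18, 8), (2, 0, 0), (0, 0, 1)]

# Sorting by numeric code value gives the same order as sorting by letters,
# because the first sound sits in the most significant field.
by_code = sorted(syllables, key=lambda s: pack(*s))
by_letters = sorted(syllables)

# Two bytes per character, against three one-byte letters when enumerated.
bytes_16bit = 2 * len(syllables)
bytes_enumerated = 3 * len(syllables)

print(by_code == by_letters, bytes_16bit, bytes_enumerated)
```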


TABLE IV
16-bit Code for Korean Script

[Table body illegible in source. For each 5-bit value
(00000 to 11111) the table lists the letter assigned as
1st, 2nd, and 3rd sound.]

* Blank: Not used


2. 16-bit Code for Chinese Characters


There is no limitation in the number of usable
Chinese characters, but statistics show that 1,800-3,000
characters cover 98-99.8 percent of those which appear in
newspapers and journals (Table II). Currently there are
only two ways to represent Chinese characters in Korea. One
method is comprised of two steps. The first step is to
display all Chinese characters (homonyms) which have the
same sound after entering the desired sound, and the second
step is to enter the Chinese character which is needed by
the user via an index number matched to that character after
selecting it in the display. The other method is to convert
a Korean character to a Chinese character after typing a
Korean character as a unit of a word, which consists of two
or three characters.

The former is inconvenient and takes a long time to
edit. The latter has less flexibility in that it is limited
by the programmed word codes. To solve the above problem and
simplify the identification of each character using a
limited number of keystrokes, a 16-bit code for Chinese
characters can be applied. Table V represents the structure
of the 16-bit code for Chinese characters.

Chinese characters represent both meaning and
phonetics to Koreans. To simplify the code, the complete
meaning and sound of the Chinese characters are not
needed. The Chinese characters are composed of from one to
five syllables for meaning and one character for the sound.
Simplicity can be achieved by employing abbreviations or
acronyms for each part (meaning and sound). For example, a
Chinese character (天) has a meaning as "Hea-Ven" and sound
as cheon. In this case we use H of Hea, V of Ven, and C of
cheon as a code for (天).
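
The acronym idea can be sketched with the Roman letters of the example above. This is only an illustration: the mapping of letters to 5-bit values (A = 0 through Z = 25) is hypothetical, as are the function names; the thesis assigns codes to Korean letters.

```python
CHINESE_FLAG = 1  # first bit of the 16-bit code: 1 marks a Chinese character


def letter_value(letter):
    """Hypothetical 5-bit value of an acronym letter: A=0 ... Z=25."""
    value = ord(letter.upper()) - ord("A")
    if not 0 <= value < 26:
        raise ValueError(f"{letter!r} is not a Roman letter")
    return value


def pack_chinese(meaning1, meaning2, sound):
    """Pack the two meaning acronyms and the sound acronym into 16 bits."""
    return ((CHINESE_FLAG << 15)
            | (letter_value(meaning1) << 10)
            | (letter_value(meaning2) << 5)
            | letter_value(sound))


# "Hea-Ven" with sound "cheon": acronym letters H, V, C.
code = pack_chinese("H", "V", "C")
print(hex(code), f"{code:016b}")
```

Three keystrokes identify the character, but any other character whose meaning and sound begin with the same three letters would receive the same code, which is the duplicate-code problem taken up next.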







TABLE V
Structure of 16-bit Code for Chinese Characters

   bit:    1   |   2 - 6       |   7 - 11      |  12 - 16
           1   |  1st meaning  |  2nd meaning  |  sound
               |  character    |  character    |  character

   1: Chinese character


But this method may result in duplicate codes, since
different Chinese characters with other meanings
may have the same value as HVC. In order to eliminate
the duplicate code and to use the 3 letter code which is
compatible with the 16-bit Korean character code, the
following characteristics of the sound and meaning of
Chinese characters are relevant: First, only 428 syllables
are used to represent the sounds for all Chinese characters.
That is, one sound can represent 1 to 60 Chinese characters.
Second, the frequency of Korean characters used for the
meaning and sound is irregular in distribution. More
specifically, 20% of Korean characters are used to represent
the sound and meaning of 95% of Chinese characters [Ref. 11].
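
How duplicate codes arise can be seen on a small sample. In this sketch the acronym is taken as the first letter of the first two meaning syllables and of the sound syllable; the sample readings in `SAMPLE` are common Korean glosses supplied here for illustration, not the thesis data, and the initial letter is extracted with Unicode arithmetic, which is an assumption of the sketch.

```python
from collections import defaultdict

HANGUL_BASE = 0xAC00
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"


def initial(syllable):
    """First-sound letter of one precomposed Hangul syllable."""
    return INITIALS[(ord(syllable) - HANGUL_BASE) // 588]


def acronym(meaning, sound):
    """(1st meaning letter, 2nd meaning letter or '', sound letter)."""
    first = initial(meaning[0])
    second = initial(meaning[1]) if len(meaning) > 1 else ""
    return (first, second, initial(sound))


# Illustrative sample: character -> (Korean meaning word, Korean sound).
SAMPLE = {
    "天": ("하늘", "천"),
    "小": ("작을", "소"),
    "少": ("적을", "소"),
    "學": ("배울", "학"),
    "校": ("학교", "교"),
}


def find_duplicates(table):
    """Group characters by acronym code and keep the codes used twice or more."""
    groups = defaultdict(list)
    for char, (meaning, sound) in table.items():
        groups[acronym(meaning, sound)].append(char)
    return {code: chars for code, chars in groups.items() if len(chars) > 1}


duplicates = find_duplicates(SAMPLE)
print(acronym(*SAMPLE["天"]), duplicates)
```

Running the same grouping over the full character set, before and after rearranging the 5-bit assignments, is one way to obtain proportions of the kind reported in Table X.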

As a result of analyzing the 1,200 sound characters
and 1,438 meaning characters used to represent the Chinese
characters, Table VI and Table VIII are derived. Table VI
represents the number of Chinese characters which have the
same first sound letter and the same second sound letter.
For example, 266 Chinese characters have the first sound
letter [letter illegible in source], and 44 Chinese
characters have [letter illegible in source] as a first
sound letter and [letter illegible in source] as a 2nd sound
letter. It is possible to rearrange the sound acronym to the
5-bit code since the distribution of sound characters is
irregular. Table VII describes the rearranged code value for
the acronym of the Chinese sound character.








TABLE VII
5-bit Code for the Acronym of Sound Character

[Table body illegible in source. For each 5-bit code
(00000 to 11111) the table lists the assigned sound
acronym.]


In Table VII the second letter [letter illegible in
source] describes all the group letters. For example,
[letter illegible in source] represents a group of letters
[letters illegible in source]. One group symbol represents
the vowel group for which the first consonant is assembled
at the left of the vowel, and another describes the vowel
group for which the first consonant is assembled above the
vowel [group letters illegible in source].

Since the frequencies of Korean character syllables
representing the sound and the meaning of Chinese characters
are different, the frequency of Korean characters used to
represent the meaning of Chinese characters needs to be
analyzed. After analyzing the sampled 1,438 characters which
are the first and the second meaning characters, Table VIII
is derived, which shows the number of times a meaning
character or a group is used.

On the basis of Table VIII, the meaning acronym values
can be reassigned to 5-bit codes. Table IX shows the
reassigned 5-bit codes representing the acronym of the
meaning character. The same theory can be applied as in
Table VII when Table IX is derived. As the acronym code is
rearranged, the proportion of the duplicate codes can be
reduced. As a result of applying these rearranged codes,
Table X can be produced, which shows the proportion of the
duplicate codes for three cases: the pure acronym code
(Table IX), which represents the acronym of a meaning and a
sound character as a first letter code of Korean characters
(ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ; 19
possible consonants); the arranged sound character acronym
code (Table VII); and the arranged sound and meaning
character acronym code (Table VII).

The reasons why some duplicate codes cannot be
eliminated are: First, some Chinese characters have similar
meaning and sound, which generates the same acronym code (22
among 1,800); and second, there are initially some Chinese
characters which have the same meaning sound (12 among
1,800). For these characters which have duplicate codes
the users must apply the exception rule. Alternatively, the
meaning of the character can be redefined to a synonym with
a different acronym. For example, if the wrong homonymous
Chinese character is displayed, the input operator may
select another form of the homonym by keying in the full
sound syllables instead of the acronym.

TABLE VIII
Frequency of Meaning Character in Chinese Characters

[Table body illegible in source.]

TABLE IX
5-bit Code for Acronym of Meaning Character

[Table body illegible in source. For each 5-bit code the
table lists the acronym of the 1st and 2nd meaning
character.]
TABLE X
Proportion of Duplicate Code

  Number of    Pure acronym   Rearranged sound   Rearranged sound and
  characters   code           character code     meaning character code
  1,800        [illegible]    [illegible]        1.8%


3. 16-bit Code for Roman Alphabet and Symbols


In order to use the three kinds of character code
together, simplify the I/O controller, and unify the word
size, the 16-bit codes for the Roman alphabet and symbols
should be generated by only one keystroke. For data
communication and for familiarity, the 16-bit code for the
Roman alphabet, symbols, and control characters can be
defined by adding only the default byte (00H) to the ASCII
code. When one uses only Roman alphanumeric characters, one
can easily convert this 16-bit code to ASCII code. Table XI
shows the 16-bit code for the Roman alphabet and symbols.
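
A minimal sketch of this conversion follows. Placing the default byte 00H in front of the ASCII byte (high byte first) is an assumption of this illustration, and the function names are hypothetical.

```python
DEFAULT_BYTE = 0x00  # byte added to the ASCII code to form the 16-bit code


def roman_to_16bit(ch):
    """16-bit code of a Roman character: default byte followed by ASCII."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return (DEFAULT_BYTE << 8) | code


def code_to_ascii(code):
    """Convert a 16-bit Roman code back to its ASCII character."""
    if code >> 8 != DEFAULT_BYTE:
        raise ValueError("not a Roman alphabet code")
    return chr(code & 0x7F)


def encode_text(text):
    """Encode Roman text as a sequence of 2-byte codes, high byte first."""
    return b"".join(roman_to_16bit(c).to_bytes(2, "big") for c in text)


print(hex(roman_to_16bit("A")), encode_text("OK"), code_to_ascii(0x0041))
```

Dropping every first byte of such a sequence recovers plain ASCII, which is the easy conversion the text refers to.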


4. Keyboard for 16 


As mentioned in the previous chapter, the biggest issue
is how to enter all Chinese characters, Korean characters,
and the Roman alphabet with a simple keyboard. In order to
implement the 16-bit code on a keyboard, one would have to
provide keystrokes which generate "1" or "0" as a Chinese
character function key (bit 1), three 5-bit codes
(00000-11111) for Chinese and Korean characters (bits 2-16),
and the 16-bit code for the Roman alphabet and symbols. In
this case 33 more keys than the common Roman alphabet
keyboard are needed. To reduce the number of keys, one more
function key can be added for the Roman alphabet, so that
the 5-bit code keys also generate the 16-bit Roman code.
However, identification by the user will be complex because
there must be 4 or 5 letters on each key top. Table XII
explains the 4 alternatives. Alternative I includes the
Roman alphabet on the 32 5-bit key tops, using a Roman
alphabet function key, and Alternative II excludes the Roman
alphabet from the 5-bit key tops. Alternative A uses the
acronym of sound and meaning characters and Alternative B
uses only the acronym of the sound character for the sound
and meaning characters.

TABLE XI
Structure of 16-bit Code for Roman Alphabet

[Table body missing in source.]

To select the best one, the authors can use the following
criteria: flexibility of hardware and software design,
hardware efficiency, ease of maintenance, system reliability,
user characteristics, number of keystrokes, number of
duplicate codes, and complexity of recognizing a certain
keystroke [Ref. 2]. In the opinion of the authors,
Alternative II-B should be selected for the ease of
explanation and understanding. By selecting Alternative II-B,
a user can type the three kinds of characters simultaneously.
For example, to type "School is 학교 (Hak-kyo) in Korean and
學校 (Hak-kyo) in Chinese character", one can type directly
TABLE XII
Leveled Letters on 32 Key Tops

Alternative I                Alternative II
I-A           I-B            II-A          II-B
(key top layout diagrams)

Legend: 1  Roman alphabet
        2  First letter of Korean character
        3  Second letter of Korean character
        4  Third letter of Korean character
        5  Acronym of Chinese sound character
        6  Acronym of Chinese meaning character

only the Roman alphabet in the previous sentence without a
function key. The two syllables of "school" are formed as
follows. In Korean, the first syllable is "ㅎ" selected from
the position of the first sound letter, "ㅏ" from the second
sound position, and "ㄱ" from the third position. The second
syllable is "ㄱ, ㅛ, Default". Then, in Chinese, press the
Chinese character function key, which generates "1" as the
first bit. Each Chinese syllable is typed as the acronym of
the first meaning character, the acronym of the second
meaning character, and the acronym of the sound character.
After typing Chinese characters, the user must release the
function key to type Korean characters. Table XIII explains
the above example.

As a result of the above example, the computer generates the
following codes in hexadecimal: 0053 (S), 0063 (c), 0068 (h),
006F (o), 006F (o), 006C (l), and 0 (Korean character),
11111 (ㅎ), 00010 (ㅏ), 00001 (ㄱ); that is 0111 1100 0100 0001
(7C 41: 학), 0 (Korean character), 00001 (ㄱ), 10010 (ㅛ),
00000 (Default); that is 0000 0110 0100 0000 (06 40: 교),
1 (Chinese character), 01010, 10100, 11110; that is
1010 1010 1001 1110 (AA 9E: 學), and 1 (Chinese character),
11110, 00010, 11110; that is 1111 1000 0101 1110 (F8 5E: 校).

TABLE XIII
Typing Procedures for Mixed Characters

To type "school, 학교, 學校", the following procedures should
be followed:

1. Type "school" by one keystroke for each character
   without a function key.

2. In Korean, to type "학교",
   first, type "ㅎ, ㅏ, ㄱ",
   second, type "ㄱ, ㅛ, Default".

3. In Chinese, to type "學校", at first, press the function
   key, then type the acronyms of the meaning and sound
   characters of each character (Table IX); the sound of
   "學" is "학" and the sound of "校" is "교".
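
The bit layout used in this example (one function-key bit followed by three 5-bit codes) can be checked with a short Python sketch. The helper names `pack_code` and `unpack_code` are hypothetical and are introduced here only for illustration; the 5-bit values are the ones quoted in the example above.

```python
def pack_code(chinese: bool, c1: int, c2: int, c3: int) -> int:
    """Pack the function-key bit and three 5-bit codes into 16 bits.

    Bit 1 (the most significant bit) is 1 for a Chinese character
    and 0 for a Korean character; bits 2-16 hold the three codes.
    """
    for c in (c1, c2, c3):
        if not 0 <= c <= 0b11111:
            raise ValueError("each code must fit in 5 bits")
    return (int(chinese) << 15) | (c1 << 10) | (c2 << 5) | c3


def unpack_code(code: int):
    """Split a 16-bit code back into (chinese, c1, c2, c3)."""
    return (bool(code >> 15), (code >> 10) & 0x1F,
            (code >> 5) & 0x1F, code & 0x1F)


if __name__ == "__main__":
    # First Korean syllable of the example: 0, 11111, 00010, 00001
    print(format(pack_code(False, 0b11111, 0b00010, 0b00001), "04X"))
    # Second Korean syllable: 0, 00001, 10010, 00000 (Default)
    print(format(pack_code(False, 0b00001, 0b10010, 0b00000), "04X"))
    # Second Chinese character: 1, 11110, 00010, 11110
    print(format(pack_code(True, 0b11110, 0b00010, 0b11110), "04X"))
```

Packing and unpacking are pure shifts and masks, so a keyboard controller or a display routine can recover the three 5-bit codes from a stored 16-bit word without a table.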


5. Operating System for Input and Output


To apply the suggested system, the operating system must be
redesigned for input and output control.
Figure 4.3 Flowchart of Input and Output Controller.

Figure 4.3 shows the flowchart of input and output control.
First, the input and output controller has to distinguish
whether the Chinese character function key is "0" or "1". A
flag register can be used for the Chinese character function
key. For example, if the flag is "1", then "1" is loaded into
the first bit of the 16-bit register, and multiple 5-bit
codes are read until the register is full. When it is full, a
character is displayed on the CRT, and the 16-bit code is
sent to a buffer as data. Otherwise the flag is "0"; then "0"
is loaded into the register, and 5-bit codes are read for a
Korean character, or a 16-bit code is used for a Roman
alphabet character or a symbol, until the 16-bit register is
full. When the 16-bit register is full, the identified
character is displayed and sent to a buffer as data. While
the "Stop edit" condition shown in Figure 4.3 is "no", the
input and output controller loops to read a code, display a
character, and send a character code to a buffer.
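
The control flow just described can be sketched as a small Python loop. This is a simplified model, not the controller of Figure 4.3: the event names ("flag", "code5", "roman") and the function `run_controller` are hypothetical, the display step is omitted, and a Roman keystroke is assumed to deliver a complete 16-bit code with a 00H high byte.

```python
def run_controller(keys):
    """Turn a stream of key events into a list of 16-bit codes.

    Each event is a (kind, value) pair:
      ("flag", 0 or 1)   Chinese character function key state
      ("code5", 0..31)   one 5-bit code keystroke
      ("roman", "x")     one Roman keystroke (complete 16-bit code)
    """
    buffer = []       # finished 16-bit codes sent on as data
    flag = 0          # flag register for the Chinese function key
    register = None   # partially filled 16-bit register
    count = 0         # number of 5-bit codes loaded so far
    for kind, value in keys:
        if kind == "flag":
            flag = value
        elif kind == "roman":
            buffer.append(ord(value) & 0x7F)
        elif kind == "code5":
            if register is None:
                # Start a new character: load the flag as the first bit.
                register, count = flag, 0
            register = (register << 5) | (value & 0x1F)
            count += 1
            if count == 3:
                # Register is full: emit the character code.
                buffer.append(register)
                register = None
        else:
            raise ValueError("unknown key event: %r" % (kind,))
    return buffer
```

For example, feeding the events for "S", the first Korean syllable of the example, and then the second Chinese character (with the function key pressed) yields the codes 0053, 7C41, and F85E.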

This system will make Korean language commands and programs
easier to use than those presently available. To achieve the
above goals, a compiler and interpreter, as well as the
operating system, will require redesigning. This system will
require the complete rewriting of all software currently
used. The economic impact of this on the Korean people will
be enormous.


6. Design Considerations for Character Generation 
There are two shapes of characters used in Korea: Gothic
(Figure 4.4) and Brush type (i.e., Ming style: Figure 4.5)
[Ref. 7: p. 34]. To generate the above shapes of characters,
several methods of character generation can be considered. To
select the best method for Korean and Chinese characters, one
can use the following five criteria: speed, space, quality,
flexibility, and cost. Speed is a double standard: speed of
creation may range from a few minutes to a few hours, while
speed of production should go beyond 1000 characters/sec
depending on type size and device resolution. Space refers to
the average size of the code for a character as well as the
size of the internal buffers often needed for decoding.
Quality is proportional to the largest dot matrix which can
be used to decode a character; it should not be confused with
the resolution of the output device. For a given type size,
the resolution sets the definition, that is, the size of the
matrix to be used; definition, hence type size, is bounded by
the quality of the code. Flexibility refers to the different
automatic modifications which are supported by the code:
scaling, rotation, and family variations (going from one
variant of a typeface to another). Cost is self-explanatory
[Ref. 12: p. 240].

Figure 4.4 An Example of Gothic Type.

Figure 4.5 An Example of Brush Type.

Obviously the five criteria above are not independent.
Figure 4.6 shows the interrelationship of criteria in
designing a character generator [Ref. 12: p. 241]. The most
desirable feature is indicated by the direction of the arrow.
Solid (resp. dotted) lines indicate agreement (resp.
contrariety) between the variation of the factors. The design
of a digital character generator is an engineer's task whose
goal is to strike the appropriate balance between the
specifications for those five criteria combined with the
characteristics of the production device, resolution and
scanning, and the necessity of operating the corresponding
creation station.

Figure 4.6 Criteria in Designing a Character Generator.

Table XIV [Ref. 12: p. 268] gives a summary of the main
characteristics of the coding methods that the engineer can
consider [Ref. 12: p. 269]. If the characteristics of Korean
and Chinese characters are compared to Table XIV, it should
be apparent that the bit map method is the most appropriate
one for this application. It reduces code space and buffer
space. It has good video scan, high speed, and highly
readable low-quality printing. Unfortunately this method
lacks flexibility. However, for all the other aforementioned
reasons, the bit map method is commonly used for Korean and
Chinese characters.

TABLE XIV

Comparative Table for Performance

Methods:  Bit map, Run length, Chain-link,
          Differential run length, Structural
Columns:  Code space (bits), Buffer space (bits),
          Flexibility, Video scan, Resolution (n),
          Quality (Qty), Speed (Spd)
(table entries omitted)

Legend:
  Qty: Quality
  Spd: Speed
  b: The number of birth points in the character
  c: A constant taking care of bookkeeping
  n: The size of the matrix
  k: The average number of runs per matrix line or column
     (a measure of the simplicity of the character shapes;
     k is lower for Roman body-text fonts and higher than
     10 for Chinese characters)
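
To make the bit map method concrete, the following Python sketch decodes a character stored as one integer per matrix row into a printable dot pattern. The function name and the sample pattern are hypothetical and are not taken from the character sets discussed here; the sketch only illustrates that decoding a bit map needs no more than a shift and a test per dot.

```python
def render_glyph(rows, width=8):
    """Render a bit-mapped character as text, one line per matrix row.

    rows  : list of integers, one per row, most significant bit leftmost
    width : number of dots per row
    """
    lines = []
    for row in rows:
        bits = format(row, "0{}b".format(width))
        lines.append(bits.replace("1", "#").replace("0", "."))
    return "\n".join(lines)


def bitmap_code_bits(n):
    """Code space of an n by n bit map, in bits."""
    return n * n


if __name__ == "__main__":
    # Hypothetical 8 by 8 pattern (an open box), for illustration only.
    sample = [0xFF, 0x81, 0x81, 0x81, 0x81, 0x81, 0x81, 0xFF]
    print(render_glyph(sample))
    print(bitmap_code_bits(24), "bits for a 24 by 24 character")
```

Because every character occupies the same fixed number of bits, a character can be located by simple address arithmetic, which is part of what makes the method fast, at the price of the flexibility noted above.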
Because of the complexity of Korean and Chinese characters,
at least a 16 by 16 resolution is required for Korean
characters, and a 24 by 24 resolution for Chinese characters.
Higher resolutions such as 64 by 64, 80 by 80, 96 by 96, and
128 by 128 are desirable when much more beauty is required
and also when larger character sizes are to be produced.
However, if these characters are displayed on a CRT with 32
by 32 resolution, with the size of each character 7-10 mm
square, this should be sufficient.

It is the authors' opinion that the less expensive 32 by 32
resolution CRT should be used for softcopy. The reason for
this is that the price of the memory component required to
hold the character definitions is continually decreasing. A
stronger motivation, however, is that high speed and
flexibility of typing are then possible [Ref. 1: p. 629]. IBM
corporation uses 16 by 16 resolution for Gothic and 24 by 24
for Brush type Korean character systems [Ref. 8: p. 2]. FACOM
corporation uses 39 by 30 dots for Korean and Chinese
characters and 24 by 30 dots for the Roman alphabet, symbols,
and Korean alphabet (letters) on a laser printer [Ref. 7: p.
471]. As cheaper dot matrix and laser printers find their way
into the marketplace, the quality of characters will become
less of an issue.

Presently there are few problems with the quality of
representing Korean characters that cannot be solved through
the additional expenditure of money. For the definition of
each character, the authors have presented two alternatives:
software and hardware (character generator). In order to
increase speed and usability, a hardware-oriented character
generator is best. If cost and flexibility are the criteria,
software-oriented character definition programs are better.
7. Memory Space for Character Definition


To represent Korean and Chinese characters, one needs to code
5,000 characters for a character definition (2,400 Korean
characters; 1,800 Chinese characters; 800 user definable
characters, Roman alphabet, and symbols). If one uses a 24
bit by 24 bit matrix font size for each character, at least
360 K bytes are required (3 bytes * 24 * 5,000) for character
definition and 128 K bytes are required (64 K for 16-bit
addresses * 2 bytes for an address of character definition
memory) for a look-up table. The total memory requirement is
488 K bytes.
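
The arithmetic above can be reproduced with a short Python sketch, which also makes it easy to try other matrix sizes. The function names are hypothetical. Note that the text counts the character definitions in decimal thousands (360 K = 360,000 bytes), while 64 K addresses is taken here as 65,536 entries; that reading is an assumption.

```python
def glyph_store_bytes(n_chars, width, height):
    """Bytes needed to store n_chars bit maps of width by height dots."""
    bytes_per_row = (width + 7) // 8   # rows are padded to whole bytes
    return bytes_per_row * height * n_chars


def lookup_table_bytes(address_bits=16, entry_bytes=2):
    """Bytes needed for a look-up table with one entry per code."""
    return (1 << address_bits) * entry_bytes


if __name__ == "__main__":
    glyphs = glyph_store_bytes(5000, 24, 24)
    table = lookup_table_bytes()
    print("character definitions:", glyphs, "bytes")
    print("look-up table:        ", table, "bytes")
    print("total:                ", glyphs + table, "bytes")
```

With a 32 by 32 matrix the same function gives 640,000 bytes for the character definitions, which shows how quickly the requirement grows with resolution.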

A large memory space is required for the definition of the
characters. Data compression of these characters can be
considered for two different purposes: data transmission, and
computer storage and output. Here one is mainly interested in
the latter case, where the main point is the total amount of
data to be stored. The methods of data compression of Chinese
characters can be classified by using the methodology listed
in Table XV [Ref. 1: p. 620].

There is a problem associated with the enlargement and
alignment of character patterns. The clarity of a character
depends on the size of the reproduction. If a large size is
required, the resolution must be high. Otherwise, stepwise
zigzags appear, which to some people are unbearable.
Therefore, all the patterns of different font sizes must be
stored. This is uneconomical. Reproducing different character
sizes from the same data is desired. However, the enlargement
and shrinking of character patterns from a single set of data
is quite difficult because, if the addition or the deletion
of a bit by interpolation is not done properly, it has a
negative influence on the aesthetics. In enlargement, the
smoothness of an edge is particularly important, while in
shrinking the gap between strokes must be carefully
maintained.
TABLE XV

Varieties of Data Compression Methods

data transmission            page unit: run-length coding
                             character unit: two-dimensional
                               predictive coding
                             coding by scan line
                             pattern unit
dot pattern representation   coding by n by n block
                             pattern unit
stroke representation        checker board sampling
                             hexagonal board sampling
contour coding               coordinate coding
                             reconstruction: contour
                               following coding
                             mathematical equation for
                               strokes
enlarge/shrink               synthesis from partial
                               character patterns


A comparative review of the options contained in Table XV
with regard to determining memory size is very difficult.
This is because the requirements for character print
qualities are quite different depending on each method.
Simplicity in the hardware and software implementation of
the compression and reconstruction of characters is a very
important consideration. Generally speaking, high data
compression methods need complex hardware and longer times
for reconstruction. Therefore, the tradeoff to be considered
is between the data compression ratio and the memory size.
This represents the classic economic tradeoff between the
hardware/software cost and the speed of character
regeneration.

Because the price of the memory component is decreasing, the
high-speed simple reconstruction method is preferred despite
the necessarily large memory. Many commercial machines have
adopted this concept, and store the character dot patterns as
they are without any data compression. For example, IBM
machines use only a 12 by 24 font size for simple letters
(Roman and Korean alphabet and symbols) instead of a 24 by 24
font [Ref. 8: p. 17]. The FACOM machines use the software
definition of the second level of Korean and Chinese
characters, which are not used frequently, for data
compression [Ref. 7: p. 12]. Because of the reduction in the
price of memory, the marketplace has shifted towards
providing direct character storage, i.e., a large memory,
instead of utilizing data compression.




V. EVALUATION OF SUGGESTED METHODS

The principal problems in current editing technology for
Chinese and Korean characters were detailed in Chapter III.
Fundamentally, the problems cause user inconvenience,
require lengthy input procedures, and result in complex
update requirements. The authors' suggested methods will
solve most problems which are encountered in Korean language
data processing. More research and development remain in the
following areas:

First, in an enumeration method, there is no problem except
low readability to Koreans and the inability to represent
Chinese characters. In this case, Chinese characters are
ignored because Korean language data can be represented
through the use of only Korean characters without serious
problems. Low readability is caused by unfamiliarity and the
unbalanced shape of each letter when written by an
enumeration method. With a minor change of shape of the
letters, this method will eliminate the above problems.

Second, the 16-bit code for the three kinds of characters
requires the consideration of the following problems:

1. The 32 key tops are complex since each key top
   represents three or four letters and acronyms. One
   solution to this problem is to use lighted, changeable
   key tops which represent only one letter or acronym at
   a time according to the function keys and the order of
   keystrokes (i.e., 1st, 2nd, 3rd letter and acronym).

2. The user must remember whether a letter to be typed is
   the first, second, or third letter, and whether it is
   an acronym of a sound or a meaning character.

3. If a user does not know the meaning of a certain
   Chinese character to be typed, one must look up a
   table which shows all meanings and sounds of all
   possible Chinese characters.

4. In typing Korean characters which consist of only
   first and second letters, a user has to hit the
   default key to make a 16-bit code. Instead of second
   letter and default keys, one can use twenty-one more
   second letter keys which generate a 10-bit code as the
   second and third letters. Unfortunately, this will
   make the keyboard more complex.

5. Regardless of the authors' analysis of sound and
   meaning characters and careful rearrangement of these
   codes, duplicate codes still exist. This is because
   of the irregularities caused by a natural evolution
   of sound and meaning characters for over 2,000 years.
   Generally 3,000 or more Chinese characters will cause
   duplicate codes to increase proportionally. In order
   to eliminate the duplicate codes, the Korean language
   committee needs to take measures to clarify the
   meanings of the Chinese characters that cause
   duplicate codes to exist.

Before the actual construction of the suggested system, an
economic (Cost/Benefit) analysis needs to be considered.
Given the r % discount rate and the various yearly costs and
benefits estimated from past data, Table XVI [Ref. 14] shows
the formula which can be used to derive the net present value
of this project. It simply states that the net present value
(NPV) is equal to the sum of the differences between benefits
(B) and costs (C) in each year (t) of the project life (T),
each divided by the relevant factor for that year. The
current estimate of the market size for word processing in
Korea is $ 2.5 million annually (Korean Daily Times, Sep 10
1984). But this estimate will be in inverse proportion to the
price of the system and will be in direct proportion to the
usefulness and the user-friendliness until maturation.
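
As an illustration of the net present value calculation described above, the following Python sketch uses the standard discounting form, in which the difference between benefit and cost in year t is divided by (1 + r) raised to the power t. The function name, the choice of year 0 as the first year, and the sample figures are assumptions made for this sketch and are not taken from Table XVI.

```python
def net_present_value(benefits, costs, rate, first_year=0):
    """Sum of (B_t - C_t) / (1 + r)**t over the project life.

    benefits, costs : yearly benefit and cost estimates, same length
    rate            : discount rate r as a fraction (0.10 for 10 %)
    first_year      : index of the first year (assumed 0 here)
    """
    if len(benefits) != len(costs):
        raise ValueError("benefits and costs must cover the same years")
    return sum((b - c) / (1.0 + rate) ** t
               for t, (b, c) in enumerate(zip(benefits, costs),
                                          start=first_year))


if __name__ == "__main__":
    # Hypothetical figures: an initial cost followed by yearly benefits.
    benefits = [0, 60, 60, 60]
    costs = [100, 10, 10, 10]
    print(round(net_present_value(benefits, costs, 0.10), 2))
```

A project is attractive under this criterion when the result is positive, which is the feasibility condition applied to both manufacturers and users in the discussion of this chapter.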

In the above formula, the market price of the system
influences the benefits for manufacturers and costs for
users. This system is feasible when the net present values
are positive for both manufacturers and users. If the
benefits for manufacturers and the costs for users are
constant in a system, the main problems will be:

1. How to minimize the costs for manufacturers

2. How to maximize the benefits for users

To solve the above problems, the best approach will be to
make an efficient and user friendly system for Korean
language data processing. This will increase the number (R)
of systems sold, and increase the individual productivity of
the users.

There are many factors and constraints which cause high cost
in implementing this method. Among these, the following three
factors affect the cost performance ratio for both
manufacturers and users:

1. The initial design cost: For this system, an
   organization has to invest initially for the design of
   about 5,000 Korean and Chinese character patterns, and
   the system software and hardware. As the number of
   systems produced by a manufacturer is increased, the
   unit cost of each system will be decreased as the
   costs are spread over more units.

2. Cost for character generator: As mentioned in Chapter
   IV, one needs about 500 K bytes memory capacity for
   these character definitions. The cost of memory is
   decreasing and the speed is increasing as technology
   is being developed. This cost is an initial cost for
   users when buying a system.

3. Cost for hardcopy: One can consider three kinds of
   printer for hardcopy: dot matrix printers, chain
   printers, and laser printers. It is not practical to
   use a chain printer for our system since the chain
   will be approximately twenty meters long (5,000
   characters * 4 mm per character) and it would be
   prohibitively slow. Currently, dot matrix printers
   and laser printers cost more than chain printers, but
   they are the only viable option.
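
The chain length quoted in the third item follows directly from the character count and the character pitch, as the Python sketch below shows. The function name is hypothetical; the 4 mm pitch is the figure given in the text.

```python
def chain_length_m(n_chars, pitch_mm):
    """Length in meters of a print chain carrying n_chars type slugs."""
    return n_chars * pitch_mm / 1000.0


if __name__ == "__main__":
    # 5,000 characters at 4 mm per character.
    print(chain_length_m(5000, 4), "meters")
```

For comparison, the same pitch applied to a 96-character Roman set gives a chain of under half a meter, which is why chain printers are workable for Roman text but not for this character set.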

Among the three kinds of cost, the third one is the most
serious since the cost of hardcopy is increasing as its use
increases. Recently laser printers have become more popular
for these characters because of the good quality, high speed
and decreasing price. Comparatively though, laser printers
are still relatively expensive.
VI. RECOMMENDATION AND CONCLUSIONS

As the demand for data processing in Korea increases, users
will continue to encounter more and more problems in
utilizing the Korean language for data processing. The
current methods of convergence, display and select to
implement the Korean language in data processing must only be
considered as interim measures due to their inefficient and
time consuming means of data entry. In order to prevent this
problem from becoming more complicated due to the development
of various new implementations forwarded by independent
research, a standardized system must be developed.

This study examined two possible solutions for using the
Korean language in data processing. The enumeration method is
technologically feasible, inexpensive, and easy to implement,
but could not be used for applications within the Korean data
processing environment. This is because it results in a
textual form of Hangul that is unfamiliar to most Korean
people. Therefore the current enumeration method is not a
feasible solution to the Korean data processing problem.

The second method examined was based on a 16-bit code
representation of Korean characters, Chinese characters, and
the Roman alphabet. This method was found to possess all the
advantages currently realized by the EBCDIC or ASCII code
representation of western countries. The only drawback to
this system is that it might not be cost effective based on
current technology. However, due to the rapid development of
hardware and software technology, a cost effective means
should be available within the next few years.

In order to accelerate the determination of a thorough, broad
based solution to the Korean data processing problem, the
Korean government must organize and charge a national level
committee with the responsibility for investigating the
problem and determining a viable solution. This study, with
its proposal of a 16-bit character code solution, can be
provided to that committee for further examination. This
proposal represents a concept that could eventually lead to a
long term viable solution to the data entry and processing
problems of using the Korean language.
LOAD COMMAND PROGRAM FOR CURRENT KEYBOARD


LGatke es) 

Lee CG, 

1*A%000814622417F4141:3 1"1"0000080808080808; 
1"0"Q007022212121227C: T*m"QO00000000000007F;: 
L™E“OOTFS4O407C4S0407F;3 1=n*Q0007F0808080808; 
1"F"007F4040706404040; Ly" 00 00F- 22 eee CTE 
.#6%001E21404047211E; I™=x"QOO007TFOOTFSO4O7F; 
1 “H%004141617F416141: 1“2°000 07 EO 11-4 ) ioe: 
L"™I"Q0OZEQRBO08CB08083E: 1", “O0000CUGC0302040-; 
1%9"003E41414141413E; 1"."90000000001818; 
LYP"QOOTESIGITE404040: 1* >" O00 201896010618 20: 
177900 36E414141454230;3 1°" 000022 Taal. 2 2; 
1290076414175 444 20m 1™°"0000494977494977;3 
1“S"003E41403E90.413E; 1™"2"Q0000065087F0808; 
I™"T*O007F0808089089808:; 1™=3"0000997909997909:; 
1™U%00414141414141.3E: Pee" 0 OCT F 2g 2 255455 
1™W" 0041414169695 50: 1*3"0G6007714444446477; 
1?¥"0041221408980808; 1"€"0000502010; 
l"*a"00007F41414141 7F;3 LI" }"QOO0ELOLOT7OLOIODOE; 
l"d"00001C22416)2 71, Poe TUG 2h2eecote2s2 6, 
L™2"QQ00CTFESLO4S040G07F: I=$"QO0OSIJESB3ZEOQI3ZEDSB:;: 
let" Q0G07F 01071 Traore, Lez O0 Soe C4 eo 2 343°: 
l*s"QO000087F00 3E413E:;3 r™€"00060408; 

Ie s"QO000EOS Osta. T™ "000402 0101010204; 
re1 "0000202620202 Ee. 1") *#000CUGI0COICOCEEF: 
1"o"00CJ04 24 2627 -ee ere 1." =: UU 20 iso «0 2 0 1 Ce 
1"p"0000090909790909; Pere" 0.018 teueeo 18 13; 

lI '4"Q00041417F5141 7F ; 1"-"909000 7F; 
L"-"0O007F 01010 lumen « P™7"0006222236494949; 
1"s"™0600456404040407F:; PSO" OC MS 2 Soe oe 21, 
1™t"0000080808142241; P™"2"O0O03C42010E30407F;? 
l*u"QO00001L1FC1011F01:; 1°35 OO TR 2 $GiE Gls] 36; 
l™"w"Q00007F0808142241; 1™4"Q0060C14247F 0404; 
1" 7"Q90000 22222220 22 Pe 5° O0C7 S07 sees tte 
1"!'"0608080808080008:; 1" 6") Ute 607s S12 Lie; 
l"1"Q9G081829CR0F083E; Pee’ OO TE Glee aoe Pe Je S 
1™¢"0091061860189601; I"3"Q003E4I4S413E41413E;3 
L"™2"OCTEZ12Z13E21217£: 149 OMGs2 + 1360142 3c: 
L"C"O0iF214040460211E: Le -"0CUCCC UU UCM eI sie: 
Pr J 900 E Gee ecie viecwes ae Le Ose) 0C0L 78000 6. 
1°K"00434C5850584C43; PMA" OG2S2424; 

1’C%00 4640404040 CGE LPeN OC 00 T7 111i) Li li 22. 
1™M"00461535549414141; Le OU SES a S6 3b C630. 
1°N"0041615149456341; Ps_"O000COG7FOO7F; 
1°¥"90461412222141C08; Pee "O0S141FF 41495522: 
APPENDIX G

LOAD COMMAND PROGRAM FOR CAPITAL LETTER KEYBOARD
1Ca*levei* ) 
1" "00; 
1"™!'"9000007F01010191020C; 17k"00000000444444704464545 
POVR OO OOOCOOREdd14224141; 1™1"00000C01017F09091151; 
1™$"000000083£483E0935£08;3 P™m"OO0000007TF1414141414; 
L°¥"J70000000007TFI022227F; rea OO COCO Z02Z3S02350202 | 
1"™°"90090000000018138: l"o"Q90000001L1111117F0101;3 
1”,"9090000999909579090509: 17®59"9009000040424277C0404; 
1™."9900000050525277D0525: 1%9%0040404040404040407F% 
1™7"90000009997S9S$79939)9S: Pere OC Teaco tT FOOTC 22412210; 
1"0"0C00600957D152550454S: ™5%9C010204C081422414141. 
PT" SOO FOlOLOLI1I1TII01IC1: L™=t"OC7TF 2424264 3445485946969; 
POE 6G 141461614614175°: L®@u"JODDDDDDDCLISIGI4SI4S7TFS 
Ae SO Oee 8006-14224 16 14.4) 4) 3 Ly 907 74446454 44464644677 5 
Pia OO EW Oo 2e2 4 262424 267TF: L™°w"OOG141LSGISGITIFSIGIGITF, 
L"S"QC4969469497TF454694697F3 Per OO Te2 25 14leleial22i1c. 
IA O0CO0040460407E20404C : I™y"000000404607C4C7C404C; 
1™7"900009000009080808087F; I™="OO7TFO1IGC1IC17TESO040407F; 
L"3"Q0000000900C00I07F: 1"£"900009301030501010107C; 
eo OC cOmOU1OTI101030579; I™C"™AODOITTFSZISOEOL4| 3e, 
Peer oCooo00017011212746141: 1™)"0000000490C14247F9404;3 
P72"O0O00COCOCTHS1I4G1I4S1417F; 1®="Q000003CS2510E30407E; 
L "AN QOCOO0O00007TF 40405040675; "190000010 22405E51211E: 
ea OR ee O71 51S tort + 54577: I"U"OCOOICTESOSESIIIGII:: 
L"C"QOD000007TFIOTFESOSCTF: Le 0000221475152 25 
ar ecm VO7FULISC1loO2c6C.: Boe OOO COO Ts le - 
ea MOC de NeOouTragls 2241; ir OOCOOO Tr: 
1F*99000007709)999112244; 1™:"9Q000008057F0808; 
Vee Oouomoy 7 151575454577; Loco oleolsec ls060l, 
Lao Oooo tT? 11il?i4i1 2: PO= "OOO eCEOUNU0GCOA1C, 
Vo" 90000 000404040406 97F : 1™>"0C201806010618-0; 
1" POW OOO re J07F00 3641323 1"?"0C3E“1JE05I8B JUNG. 
VS Oooo mn ooo609146224141 ; 1% 300364141 364141 3E; 
BOO OO (Ole uo32460 494 35 ere OO2bece 3 ob Ola2 3c: 
Dev 7 Od Omo00 721212254969; 1@L%900 354444 364546394 
im Meoodslaltrsisi7e: HemgGNG2245595t221C; 
Peron goal e 22541461221; 1"N"9008080808I80095. 
Pry ComoOogmG? 7 lO0ls7S4co77: PMO" GO7TFoic2d+331C eu. 
l"27"QOQ000000TFIJIIITEGOTE: 17P"9051520408102343; 
Tae EO UO UC4 Ger +0464 97 ° 1™0"0G10264+040+02010; 
iMaVOOTESGOGO60+05060607F; 1"\"9009460408; 
L"e"O00T7TFQO0007F+04040507F: Wee cOdsoc TlLOLOYC20S. 
Liao aloo SF 01010151: P"_"JOCOOIDTFOOTF: 
Par oo 1@00%h 191422414141; Wee oa l- 644555 22 5 